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Approximate Bayesian computation (ABC) have become an essen- 
tial tool for the analysis of complex stochastic models. Grelaud et 
al. (2009, Bayesian Ana 3:427-442) advocated the use of ABC for 
model choice in the specific case of Gibbs random fields, relying on 
a inter-model sufficiency property to show that the approximation 
was legitimate. We implemented ABC model choice in a wide range 
of phylogenetic models in the DIY-ABC software (Cornuet et al. 
(2008) Bioinfo 24:2713-2719). We now present arguments as to 
why the theoretical arguments for ABC model choice are missing, 
since the algorithm involves an unknown loss of information induced 
by the use of insufficient summary statistics. The approximation 
error of the posterior probabilities of the models under comparison 
may thus be unrelated with the computational effort spent in run- 
ning an ABC algorithm. We then conclude that additional empirical 
verifications of the performances of the ABC procedure as those 
available in DIYABC are necessary to conduct model choice. Q 
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retical results that mathematically validate model choice for 
insufficient statistics are currently lacking on a general basis. 

Our conclusion at this stage is to opt for a cautionary 
approach in ABC model choice, handling it as an exploratory 
tool rather than trusting the Bayes factor approximation. The 
corresponding degree of approximation cannot be evaluated, 
except via Monte Carlo evaluations of the model selection 
performances of ABC. More empirical measures such as those 



proposed in the DIY-ABC software ( Cornuet et al. 2008 \ and 



in Ratmann et al. ( 2009 1 thus seem to be the only available 



solution at the current ti me for condu c ting model c omparison. 

We stress that, while Templeton (2008 20101 repeatedly 
expressed reservations about the formal validity of the ABC 
a pproach in statist i cal te s ting, those criticisms were rebutted 



Beaumont et al.|p010| ; Csillery et al. ( ^2010b^ ; Berger et al. 
and are not relevant for the current paper. 



Abbreviations: ABC, approximate Bayesian computation; ABC-MC, ABC model 
choice; DIY-ABC, Do-it-yourself ABC; IS, importance sampling; SMC, sequential 
IVlonte Carlo 

Inference on population genetic models such as coalescent 
trees is one representative example of cases when statistical 
analyses like Bayesian inference cannot easily operate because 
the likelihood function associate d with the data cannot be 
computed in a manageable time (Tavare et al. 11997 Beau- 
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reason for this impossibility is that the model associated with 
coalescent data has to integrate over trees of high complexity. 

In such settings, tr aditional approximation tools like 
Monte Carlo simulation ( [Robert and Casella[ [2004 [ ) from the 
posterior distribution are unavailable for practical purposes. 
Indeed, due to the complexity of the latent structures defin- 
ing the likelihood (like the coalescent tree), their simulation 
is too unstable to bring a reliable approximation in a man- 
ageable time. Such complex models call for a pra ctical if 
cruder approximation method, the ABC methodology ( |Tavare| 
et al. |l997j pri tchard ct al.[ |i999| . This rejection technique 
bypasses the computation of the likelihood via simulations 

and 



from the corresponding distribution (see Beaumont 2010 ; 
Lopes and Beaumont[ |2010| for recent surveys, and |Csillery| 



et al.[ |2010a| for the wide and successful array of applications 
based on implementations of ABC in genomics and ecology). 

We argue here that ABC is a generally valid approxima- 
tion method for doing Bayesian inference in complex models. 
However, without further justification, ABC methods cannot 
be trusted to discriminate between two competing models 
when based on insufficient summary statistics. We exhibit 
simple examples in which the information loss due to insuffi- 
ciency leads to inconsistency, i.e. when the ABC model selec- 
tion fails to recover the true model, even with infinite amounts 
of observation and computation. On the one hand, ABC using 
the entire data leads to a consistent model choice decision but 
it is clearly infeasible in most settings. On the other hand, 
too much information loss due to insuffiency leads to a statisti- 
cally invalid decision procedure. The challenge is in achieving 
a balance between information loss and consistency. Theo- 



Statistical Methods 

The ABC algorithm. The setting in which ABC operates is the 
approximation of a simulation from the posterior distribution 
n{9\y) oc n{9)f{y\0) when distributions associated with both 
the prior tt and the likelihood / can be simulated (the later 
being unavailable in clos ed form). The fir st ABC algorithm 
was introduced by ,Pritchard et al. ( 1999 1 as follows: given 
a sample y from a sample space T>, a sample {6i, . . . , 6m) is 
produced by 

Algorithm 1: ABC sampler 

for i = 1 to A'^ do 
repeat 

Generate 0' from the prior distribution 7r(-) 

Generate z from the likelihood ,f{-\0') 
until p{r?(z),r;(y)} < e 
set e, = 0', 
end for 

The parameters of the ABC algorithm are the so-called 
summary statistic ?;(•), the distance p{-, ■}, and the tolerance 
level e > 0. The approximation of the posterior distribution 
7r(0|y) provided by the ABC sampler is to instead sample 
from the marginal in of the joint distribution 



^.(0,z!y) 



^(0),f(z|0)U,,^(z) 



where Ib(-) denotes the indicator function of B and 

= {zGO|p{r?(z),77(y)}<e}. 

The basic justification of the ABC approximation is that, 
when using a sufficient statistic r) and a small (enough) toler- 
ance e, we have 

7r,(6>ly) = j 7r,(0,zly)dz^7r(01y). 



■'^CPR, JMM and NSP designed and performed research, JMC and JMM analysed data, and CPR, 
JMC and JMM wrote the paper. 
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In practice, the statistic rj is necessarily insufficient (since 
only exponenti al families enjoy suffi cient statistics with fixed 
dimension, see Lehmann and Casella, 1998) and the approxi- 
mation then converges to the less informative 7r(0|77(y)) when 
e goes to zero. This loss of information is a necessary price 
to pay for the access to computable quantities and 7r(0jr;(y)) 
provides a convergent infere nce on 9 when is identifiabl e 
in the distribution of r;(y) ( [Fearnhead and Prangle[ |2010[ ). 
While acknowledging the gain brought by ABC in handling 
Bayesian inference in complex models, and the existence of in- 



volvcd summary selectio n mech anisms ( Joyce and Marjoram 
[2008 Nuncs and Balding. |2010| l, we demonstrate here that the 
loss due to the ABC approximation may be arbitrary in the 
specific setting of Bayesian model choice via posterior model 
probabilities. 

ABC model choice. The standard Ba yesian tool for model 
comparison is the marginal likelihood ( Jeffreys 1939 1 



^(y) = / A0).f{y\ 

Je 



0) d0. 



which leads to the Bayes factor for comparing the evidences 
of models with likelihoods /i(yi0i) and f2{y\02), 



BMy) = 



m(y) ^ /ei M0i)My\Oi)d0i 
My) ^ /e^7r2(6»2)/2(y|6>2)d6l2 



As detailed in Beaumont et al.| ( [2010 1 , it provides a valid cri- 
terion for model comparison that is naturally penalised for 
model complexity. 

Bayesian model choice proceeds by creating a probability 
structure across M models (or likelihoods). It introduces the 
model index M as an extra unknown parameter, associated 
with its prior distribution, Tr{A4 — m) (m = 1, . . . , M), while 
the prior distribution on the parameter is conditional on the 
value m of the M index, denoted by Hm{0m) and defined on 
the parameter space O™,- The choice between those models is 
then driven by the posterior distribution of M , 

^{M = mjy) = -K^M = m)it;„(y)/^ 7r(X = k)wk{y) 

k 

where it)fc(y) denotes the marginal likelihood for model k. 

While this posterior distribution is straightforward to in- 
terpret, it offers a challenging computational conundrum in 
Bayesian analysis. When the likelihood is not available, ABC 



represents the almost unique solution. |Pritchard et al. |l999 1 
describe the use of model choice based on ABC for distin- 
guishing between different mutation models. The justification 
behind the method is that the average ABC acceptance rate 
associated with a given model is proportional to the posterior 
probability corresponding to this approximative model, when 
identical summary statistics, distance, and tolerance level are 
used over all models. In practice, an estimate of the ratio of 
marginal likelihoods is given by the ratio of observed accep- 
tance rates. Using Bayes formula, estimates of the posterior 
probabilities are straightforward to derive. This appro ach has 
been widely implemented in the literature (see, e.g 



Estoup 



etaLl 2004, Mi ller et al.|p005||Pasc"ual et al.[ [20071 and |Sain- 
udiin et al.| [2011 1. 

A representative illustr ation of the use of an ABC model 
choice approach is given by Miller et al. ( 2005 1 which analyses 
the European invasion of the western corn rootworm. North 
America's most destructive corn pest. Because this pest was 
initially introduced in Central Europe, it was believed that 
subsequent outbreaks in Western Europe originated from this 
area. Based on an ABC model choice analysis of the genetic 
variability of the rootworm, the authors conclude that this 



belief is false: There have been at least three independent in- 
troductions from North America during the past two decades. 

The above estimate is improved by regression regularisa- 
tion (Fagundcs ct al. , 2007), where model indices are pro- 
cessed as categorical variables in a polychotomous regression. 
When comparing two models, this involves a standard logis- 
tic regress ion. Rejection- b ased a pproaches were l ately i ntro- 
duced by ICornuet et all 12008]), IGrelaud et al.l ([20091 and 



Toni et al.| (|^uuy||, m a ivionte uarlo simulation ot moael m 
dices as well as model parameters. Those recent extensions 
are already widely u sed in population g enetics, as exempli- 
fied by Belle et al.| ([2008|); [Cornue t et al. (2010); Excoffier[ 
et al. (2009); Ghirotto ct al. (2010); Guillcmaud ct al. (2009|iJ 
neucnbcrgcr and Wcgmann (2010); Patin ct al. (2009); Ra- 
makrishnan and Hadly (2009); Vcrd u et al.| (|2009); We emann 
and Excoffior (2010). Another illustration oT the popularity 
of this approach is given by the availability of four softwares 
implementing ABC model choice methodologies: 

• ABC-SysBio, which relies on a SMC-based ABC for infer- 



ence in system biology, including model-choice ( Toni et al. 
2009). 

ABCToolbox which proposes SMC and MCMC implemen- 
tations ,_as_well as Bayes factor approximation ( Wcgmann] 



[20111 

be: 



et al 
DIYl: 

on populati on history using molecular markers (Cornuet 



which relies on a regularised ABC-MC a lgorithm 



et 



'^[20b8| ) 

FopABC, which relies o n a regular ABC-M C algorithm for 



genealogical simulation ( Lopes et al. 2009 1 



As ex pos ed in e.g. Grelaud et al. (20091, Toni and Stumpf 
( 2010| , or |Didelot et al.| ( |2011| ), once M is incorporated within 
the parameters, the ABC approximation to its posterior fol- 
lows from the same principles as in regular ABC. The corre- 
sponding implementation is as follows, using for the summary 
statistic a statistic r/(z) = {?7i(z), . . . ,r;M(z)} that is the con- 
catenation of the summary statistics used for all models (with 
an obvious elimination of duplicates). 

Algorithm 2: ABC-MC 

for i = 1 to do 
repeat 

Generate m from the prior n^M — m) 

Generate 0m from the prior nm{0m) 

Generate z from the model fm{z\0,n) 
until p{T7(z),f?(y)} < e 
Set m*^' = m and 9'^^^ = 0m 
end for 

The ABC estimate of the posterior probability ■k{A4 = 
mjy) is then the frequency of acceptances from model m in 

the above simulation 7r(A^ — m\y) — X^ili Im(»)=m • 

This also corresponds to the frequency of simulated pseudo- 
datasets from model m that are closer to the data y than the 
tole rance e. In order to improve the estimation by smooth- 
ing, Cornuet et al. ( 2008 1 follow the rationale that motivated 
the use of a local linear regression in Beaumont et al. ( 2002 1 



and rely on a weighted polychotomous regression to estimate 
n{A4 — m\y) based on the ABC output. This modelling is 
implemented in the DIYABC software. 



The difficulty with ABC-MC. There is a fundamental discrep- 
ancy between the genuine Bayes factors/posterior probabili- 
ties) and the approximations resulting from ABC-MC. 
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The ABC approximation to a Bayes factor, B12 say, re- 
suhing from Algorithm 2 is 



Bi2(y) = AM = 2)J2^^iM=i/AM = 1)^1 



An alternative representation is given by 



BMy) 



n{M = 2) ELi- 



T7T,* — 1 p{t7{z* ) ,T7(y) } 



m*=2 V{»7{z*),77(y)}<e 



where the pairs (m*, 2*) are simulated from the joint prior and 
T is the number of simulations necessary for A'' acceptances 
in Algorithm 2. In order to study the limit of this approxima- 
tion, we first let T go to infinity. (For simplification purposes 
and without loss of generality, we choose a uniform prior on 
the model index.) The limit of -612 (y) is then 



nM = l,pMz),rj{y)}<e] 



¥[M = 2,p{riiz),rj{y)}<e] 

//llp{r,(z),-,j(y)}<.7ri(6>l)/l(z|6>l)dz d6>i 
//Ip{,7(^),r7(y)}<e'r2(02)/2(z|6'2)dz d6»2 

JJl,ir,My)}<^MOi)f?iri\0i)dride, 
nip{r,My)}<^MO2)f^{Tl\02)drid02 ' 

where fi{ri\9i) and /2'(^|^2) denote the densities of ri{z) 
when z ~ /i(z|0i) and z ~ /2(z|02), respectively. By 
L'Hospital formula, if e goes to zero, the above converges to 

sr2(y)= / MSi)ff{v(y)\0i)<i6^/ [ MO2)f5{v{y)\O2)d02 



namely the Bayes factor for testing model 1 versus model 2 
based on the sole observation of rj{y). This result reflects 
the current perspective on ABC: the inference derived from 
the ideal ABC output when e = only uses the information 
contained in r;(y). Thus, in the limiting case, i.e. when the al- 
gorithm uses an infinite computational power, the ABC odds 
ratio does not account for features of the data other than the 
value of T7(y), which is why the limiting Bayes factor only 
depends on the distribution of t/ under both models. 

When running ABC for point estimation, the use of an 
insufficient statistic does not usuall y jeopardise convergence 
of the method. As shown, e.g., in (Fearnhead and Prangle 



2010 Theorem 2), the noisy version of ABC as an inference 



method is convergent under us ual regularity con ditions for 
model-based Bayesian inference (Bernardo and Smith, lOOT), 
including identifiability of the parameter tor the insufficient 
statistic rj. In contrast, the loss of information induced by 
T] may seriously impact model-choice Bayesian inference. In- 
deed, the information contained in ?7(y) is lesser than the 
information contained in y and this even in most cases when 
ri{y) is a sufficient statistic for both models. In other words, 
T/(y) being sujficient for both fi{y\0i) and f2{y\02) does not 
usually imply that 'r]{y) is sujficient for {ni, fm{y\0m)} ■ To 
see why this is the case, consider the most favourable case, 
namely when r7(y) is a sufficient statistic for both models. 



We then hav e by the factorisation theorem (Lehmann and 
Casella|p98l) that /,(y|0O = g^{y)fl'(rj{y)\0-}(i = l,'2),l 



Bi2iy) 



My) ^ /e,^(ei)gi(y)/r(^(y)|ei)dei 

My) ' Se,A02)g2(y)f^{ri{y)\02)d02 
gi(y) J^i(ei)A''(r?(y)|ei)dei 

32 (y) /^2(02)/2''(r?(y)|02)d02 

31 (y) 



32 (y) 



5^2 (y). 



[1] 



Thus, unless gi{y) = 172 (y) , as in the special case of Gibbs ran- 
dom fields detailed below, the two Bayes factors differ by the 
ratio <7i (y)/32(y), which is only equal to one in a very small 
number of known cases. This decomposition is a straightfor- 
ward proof that a model-wise sufficient statistic is usually not 
sufficient across models, hence for model comparison. An im- 
mediate corollary is that the ABC-MC approximation does 
not always converge to the exact Bayes factor. 

The discrepancy between limiting ABC and genuine 
Bayesian inferences does not come as a surprise, because ABC 
is indeed an approximation method. Users of ABC algorithms 
are therefore prepared for some degree of impreci sion i n their 



final answer, a point stres sed by Wilkinson] ( 2008 1 and Fearn 
[head and Prangle] ( |2010[ ) when they qualify ABC as exact 
inference on a wrong model. However, the magnitude of the 
difference between -Bi2(y) and BJ'gCy) expressed by [[T| is such 
that there is no direct connection between both answers. In a 
general setting, if 77 has the same dimension as one component 
of the n components of y, the ratio ffi(y)/<?2(y) is equivalent 
to a density ratio for a sample of size 0(n), hence it can be 
arbitrarily small or arbitrarily large when n grows. Contrast- 
ingly, the Bayes factor -6^2 (y) is based on an equivalent to a 
single observation, hence does not necessarily converge with 
n to the correct limit, as shown by the Poisson and normal 
examples below and in SI. The conclusion derived from the 
ABC-based Bayes factor may therefore completely differ from 
the conclusion derived from the exact Bayes factor and there 
is no possibility of a generic agreement between both, or even 
of a manageable correction factor. This discrepancy means 
that a theoretical validation of the ABC-based model choice 
procedure is currently missing and that, due to this absence, 
potentialy costly simulation-based assessments are required 
when calling for this procedure. 

Therefore, users must be warned that ABC approxima- 
tions to Bayes factors do not perform as standard numerical 
or Monte Carlo approximations, with the exception of Gibbs 
random fields detailed in the next section. In all cases when 
(?i(y)/(72(y) differs from one, no inference on the true Bayes 
factor can be derived from the ABC-MC approximation with- 
out further information on the ratio gi{y)/g2{y), most often 
una vailable in sett i ngs w here ABC is necessary. 



Didelot et al. 



^ (2011| also derived this relation between 

both Bayes factors in their formula [18]. While they still ad- 
vocate the use of ABC model choice in the absence of sufficient 
statistic, we stress that no theoretical guarantee can be given 
on the validity of the ABC approximation to the Bayes factor 
and hence of i ts use as a model ch oice procedure. 

(2009| resort to full allelic distri- 
work, 



Note that Sousa et al 
butions in an ABC frame 
statistics 



instead of chosing summary 
They show how to apply ABC using allele frequen- 
cies to draw inferences in cases where selecting suitable sum- 
mary statistics is difficult (and where the complexity of the 
model or the size of dataset prohibits to use full-likelihood 
methods). In such settings, ABC-MC does not suffer from 
the divergence exhibited here because the measure of distance 
does not involve a reduction of the sampl e. The same com- 
ment applies to the ABC-SysBio software of |Toni et a l. (2009|), 
which relies on the whole dataset. The theoretical validation 
of AB C inference in hidden Markov models by |Dean et al.| 
(2011 1 should also extend to the model choice setting because 
the approach does not rely on summary statistics but instead 
on the whole sequence of observations. 



Results 
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The specific case of Gibbs random fields. In an a pparent con- 
tradiction with the above, [Grelaud at al.| | |2009[ ) showed that 
the computation of the posterior probabihties of Gibbs ran- 
dom fields under competition can be done via ABC tech- 
niques, which provide a converging approximation to the true 
Bayes factor. The reason for this result is that, for these 
models in the above ratio gi{y) = <72(y)- The validation 
of an ABC comparison of (Jibbs random fields is thus that 
their specific structure allows for a sufficient statistic vector 
that runs across models and therefore leads to an exact (when 
e = 0) simulation from the posterior probabilities of the mod- 
els. Each Gibbs random field model has its own sufficient 
statistic (•) and Grelaud et al. (20091 exposed the fact that 
the vector of statistics ?7(-) = (77i(-),. . . ,77m(')) ^-l^o suffi- 
cie nt for the joint param eter {M,di, . . . , Om). 



of 



Didelot et al. (20111 point out that this specific property 
iibbs random fields can be extended to any exponential 



family (hence to any setting with fixed-di mension sufficient 



statistics, see 
is that, by inc 



Lehmann and Casella[ 19981. Their argument 
uding all sufficient statistics and all dominating 
measure statistics in an encompassing model, models under 
comparison are submodels of the encompassing model. The 
concatenation of those statistics is then jointly sufficient across 
models. While this encompassing principle holds in full gen- 
erality, in particular when comparing models that are already 
embedded, we think it leads to an overly optimistic perspec- 
tive about the merits of ABC for model choice: in practice, 
most complex models do not enjoy sufficient statistics (if only 
because they are beyond expone ntial families). The Gibbs 
case processed by Grelaud et al. ( 2009^ therefore happens to 
be one of the very few realistic counterexamples. As demon- 
strated in the next section and in the normal example in SI, 
using insufficient statistics is more than a mere loss of infor- 
mation. Looking at what happens in the limiting case when 
one relies on a common model-wise sufficient statistic is a for- 
mal but useful study since it brings light on the potentially 
huge discrepancy between the ABC-based and the true Bayes 
factors. To develop a solution to the problem in the formal 
case of the exponential families does not help in understanding 
the discrepancy for non-exponential models. 
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Fig. 1. Comparison between the true log-Bayes factor (first axis) for the com- 
parison of a Poisson model versus a negative binomial model and of the log-Bayes 
factor based on the sufficient statistic (second axisj, for Poisson (left) and 

negative binomial (left) samples of size n = 50, based on T = 10^ replications 



Arbitrary ratios. The difficulty with the discrepancy between 
Si2(y) and _BJ^2(y) is that this discrepancy is impossible to 
evaluate in a general setting, while there is no reason to ex- 
pect a reasonable agreement betwe en both quant ities. A first 
illustration was produced by jMarin et al. (20111 in the case 
of MA(g) models. 

A simple illustration of the discrepancy due to the use of 
a model- wise sufficient statistic is a a sample y = (j/i, . . . , 
that could come either from a Poisson P(A) distribution or 
from a geomet r ic Q{p ) distribution, already introduced in 
Grelaud et al. (20091 as a counter -example to Gibbs r an- 
dom fields and later reprocessed in |Didel ot et al.| ( |2011[ ) to 
support their sufficiency argument. In this case, the sum 
S — X]r=i ~ '7(y) is ^ sufficient statistic for both models 
but not across models. The distribution of the sample given 
S is a multinomial M{S, 1/n, . . . , 1/n) distribution when the 
data is Poisson, while it is the uniform distribution over the 
y's such that - yi = S in the geometric case, since S is then 
a negative binomial A/'e(;(n,p) variable. The discrepancy ratio 
is therefore 



5i(y)/ff2(y) = 



-5'- 
S 



S\n-yi[y, 



When simulating n Poisson or geometric variables and us- 
ing prior distributions as exponential A ~ f (1) and uniform 
p ~ W(0, 1) on the parameters of the respective models, the 
exact Bayes factor is available and the distribution of the dis- 
crepancy is therefore available. Figfllgives the range of -Bi2(y) 
versus -Bj'2(y), showing that -Bj^2(y) is then unrelated with 
Si2(y): the values produced by both approaches have noth- 
ing in common. As noted above, the approximation -6^2 (y) 
based on the sufficient statistic S is producing figures of the 
magnitude of a single observation, while the true Bayes factor 
is of the order of the sample size. 

The discrepancy between both Bayes factors is in fact in- 
creasing with the sample size, as shown by the following result; 
Lemma 1. Consider model selection between model 1: 'P(A) 
with prior distribution 7ri(A) equal to an exponential dis- 
tribution and model 2: Q(p) with a uniform prior distribution 
TT2 when the observed data y consists of iid observations with 
expectation E[yi] = > 0. Then S{y) — X^ILi min- 
imal sufficient statistic for both models and the Bayes factor 
based on the sufficient statistic S{y), _B^2(y)> satisfies 

lim B^^(^y) = ^0-^(^0 + ife'"" a.s. 

n—^oo 

Therefore, the Bayes factor based on the statistic S{y) is 
not consistent; it converges to a non-zero, finite v alue. 

In this specific setting, Didelot et al. (20111 show that 



adding P — YiiUi^- to the S creates a statistic (o, P) that is 
sufficient across both models. While this is mathematically 
correct, it does not provide further understanding of the be- 
haviour of ABC-model choice in realistic settings: outside for- 
mal examples as the one above and well-structured if complex 
exponential families like Gibbs random fields, it is not possi- 
ble to devise completion mechanisms that ensure sufficiency 
across models, or even select well-discriminating statistics. It 
is therefore more fruitful to study and detect the diverging 
behaviour of the ABC approximation as given, rather than 
attempting at solving the problem in a specific and formal 



Population genetics. We recall that ABC has first been in- 



2002 



troduced by populati on geneticists ( Beaumont et ah] 
Pritchard et al. 1999[ ) for statistical inference about trie evo- 
lutionary history of species, because no likelihood-based ap- 
proach existed apart from very simple and hence unrealistic 
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situations. This approach ha s since been used in an increasing 
number of biological studies (Esto up et al.[ |2004[ [Estoup and| 



Clegg 
model 



2003j [Fagundcs ct al. , 2007|7 most of them including 
choice. It is therefore crucial to get insights in the valid- 



ity of such studies, particularly when they de al with species of 



econo mical or ecological importance (see, e.g., Lombaert et al.[ 
20101. To this end, we need to compare ABC estimates of 



posterior probabilities to reliable likelihood-based estimates. 
Comb ining different modules based on |Stephens and Donnelly] 
( |2000[ ), it is possible to approximate the likelihood of popu- 
lation genetic data through importance sampling (IS) even 
in complex scenarios. In order to evaluate the potential dis- 
crepancy between ABC-based and likelihood-based posterior 




Fig. 2. Comparison of IS and ABC estimates of t hie posterior probability of scenari 
1 in the first population genetic experiment, using 24 summary statistics 




mportance Sampling 



Fig. 3. Same caption as Fig|2]when using 15 summary statistics 
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Fig. 4. Comparison of IS and ABC estimates of the posterior probability of scenario 
1 in the second population genetic experiment 



probabilities of evolutionary scenarios, we designed two ex- 
periments based on simulated data with limited information 
content, so that the choice between scenarios is problematic. 
This setting thus provides a wide enough set of intermediate 
values of model posterior probabilities, in order to better eval- 
uate the divergence between ABC and likelihood estimates. 

In the first experiment, we consider two populations (1 
and 2) having diverged at a fixed time in the past and a third 
population (3) having diverged from one of those two popu- 
lations (scenarios 1 and 2 respectively). Times are set to 60 
generations for the first divergence and to 30 generations for 
the second divergence. One hundred pseudo observed datasets 
have been simulated, represented by 15 diploid individuals per 
population genotyped at five independent microsatellite loci. 
These loci are assumed to evolve according to the strict Step- 
wise Mutation model (SMM), i.e. when a mutation occurs, the 
number of repeats of the mutated gene increases or decreases 



Fig. 5. Boxplots of the posterior probabilities evaluated over 10 independent 
Monte Carlo evaluations, for five independent simulated datasets in the first popula- 
tion genetic experiment 



Fig. 6. Boxplots of the posterior probabilities evaluated over 10 independent 
Monte Carlo evaluations, for five independent simulated datasets in the second pop- 
ulation genetic experiment 
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Tab. SI Summary statistics used in the population genetic experiments, the Subset column corresponding to the ABC operated with 
15 summary statistics and the last three statistics being only used in this reduced collection 

Name Subset Definition 



M A 1 1 


yes 


M A 1 n 

NAL2 


yes 


M A 1 Q 


yes 


Hb 1 1 


yes 


Hb 1 1 


yes 


Hb 1 i 


yes 


\ /A D 1 

VAKl 


yes 


\ /A D'l 

VAKz 


yes 


\ /A Do 

VAKj 


yes 


^ /I /~\ A/1 
MbWi 


no 


IVIvjVvz 


no 




no 


FSTl 


no 


FST2 


no 


FST3 


no 


LIK12 


no 


L1K13 


no 


LIK21 


no 


LIK23 


no 


LIK31 


no 


LIK32 


no 


DAS12 


yes 


DAS13 


yes 


DAS23 


yes 


Dl\/1212 


yes 


Dl\/1213 


yes 


Dl\/1223 


yes 



average number of alleles in population 1 
average number of alleles in population 2 
average number of alleles in population 3 
average heterozygothy n population 1 
average heterozygothy n population 2 
average heterozygothy n population 3 
average variance of the allele size in population 1 
average variance of the allele size in population 2 
average variance of the allele size in population 3 
Garza-Williamson M in population 1 
Garza-Williamson M in population 2 
Garza-Williamson M in population 3 
average FST in population 1 
average FST in population 2 
average FST in population 3 
probaoility that sample 1 is from population 1 
probability that sample 1 is from population 3 
probability that sample 2 is from population 1 
probability that sample 2 is from population 3 
probability that sample 3 is from population 1 
probability that sample 3 is from population 2 
shared allele distance between populations 1 and 2 
shared allele distance between populations 1 and 3 
shared allele distance between populations 2 and 3 
distance (<5;i)^ between populations 1 and 2 
distance between populations 1 and 3 
distance (4^)^ between populations 2 and 2 



by one unit with equal probability. The mutation rate, com- 
mon to all five loci, has been set to 0.005 and effective pop- 
ulation sizes to 30. In this experiment, both scenarios have 
a single parameter: the effective population size, assumed to 
be identical for all three populations. We chose a uniform 
prior (7[2, 150] for this parameter (the true value being 30). 
The IS algorithm was performed using 100 coalescent trees per 
particle. The marginal likelihood of both scenarios has been 
computed for the same set of 1000 particles and they provide 
the posterior probability of each scenario. The ABC compu- 



tations have been performed with DIYABC (Cornuet et al. 
[2008). A reference table of 2 million datasets has been sim 
mated using 24 usual summary statistics (provided in Table 
SI) and the posterior probability of each scenario has been 
estimated as their proportion in the 500 simulated datasets 
closest to the pseudo observed one. This population genetic 




Fig. 7. Comparison between two approximations of tlie posterior probabilities 
of scenario 1 based on importance sampling with 50,000 particles (first axis) and 
ABC (second axis) for the larger population genetic experiment 



setting does not allow for a choice of sufficient statistics, even 
at the model level. 

The second experiment also opposes two scenarios includ- 
ing three populations, two of them having diverged 100 gen- 
erations ago and the third one resulting of a recent admixture 
between the first two populations (scenario 1) or simply di- 
verging from population 1 (scenario 2) at the same time of 5 
generations in the past. In scenario 1, the admixture rate is 
0.7 from population 1. Pseudo observed datasets (100) of the 
same size as in experiment 1 (15 diploid individuals per popu- 
lation, 5 independent microsatellite loci) have been generated 
for an effective population size of 1000 and mutation rates 
of 0.0005. In contrast with experiment 1, analyses included 
the following 6 parameters (provided with corresponding pri- 
ors): admixture rate ((7 [0.1, 0.9]), three effective population 
sizes (i7[200, 2000]), the time of admixture/second divergence 
([/[1, 10]) and the time of the first divergence {U[50, 500]). To 
account for an higher complexity in the scenarios, the IS algo- 
rithm was performed with 10,000 coalescent trees per particle. 
Apart from this change, both ABC and likelihood analyses 
have been performed in the same way as experiment 1. 

Fig [2] shows a reasonable fit between the exact posterior 
probability of model 1 (evaluated by IS) and the ABC approx- 
imation in the first experiment on most of the 100 simulated 
datasets, even though the ABC approximation is biased to- 
wards 0.5. When using 0.5 as the decision boundary between 
model 1 and model 2, there is hardly any discrepancy between 
both approaches, demonstrating that model choice based on 
ABC can be trusted in this case. Fig [3] considers the same set- 
ting when moving from 24 to 15 summary statistics (given in 
Table SI): the fit somehow degrades. In particular, the num- 
ber of opposite conclusions in the model choice moves to 12%. 
In the more complex setting of the second experiment, the dis- 
crepancy worsens, as shown on Fig|4] The number of opposite 
conclusions reaches 26% and the fit between both versions of 
the posterior probabilities is considerably degraded, with a 
correlation coefficient of 0.643. 
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The validity of the importance sampling approximation 
can obviously be questioned in both experiments, however 
Fig [5] and Fig [6] display a strong stability of the posterior 
probability IS approximation across 10 independent runs for 
5 different datasets and gives proper confidence in this ap- 
proach. Increasing the number of loci to 50 and the sample 
size to 100 individuals per population (see SI) leads to pos- 
terior probabilities of the true scenario overwhelmingly close 
to one (Fig[7|, thus bluring the distinction between ABC and 
likelihood based estimates but also reassuring on the ability 
of ABC to provide the right choice of model with a higher 
information content of the data. Actually, we note that, for 
this experiment, all ABC-based decisions conclude in favour 
of the correct model. 



Discussion 

Since its introduction by [Tavare et al.| ( |1997[ ) and [Pritchard] 
et al. (19991, ABC has been extensively used in several areas 
involving complex likelihoods, primarily in population genet- 
ics, both for point estimation and testing of hypotheses. In 
realistic settings, with the exception of Gibbs random fields, 
which satisfy a resilience property with respect to their suf- 
ficient statistics, the conclusions drawn on model comparison 
cannot be trusted per se but require further simulations anal- 
yses as to the pertinence of the (ABC) Bayes factor based 
on the summary statistics. This paper has examined in de- 
tails only the case when the summary statistics are sufficient 
for both models, while practical situations imply the use of 
insufficient statistics. The rapidly increasing number of appli- 
cations estimating posterior probabilities by ABC indicates a 
clear need for further evaluations of the worth of those esti- 
mations. 

Further research is needed for producing trustworthy ap- 
proximations to the posterior probabilities of models. At this 
stage, unless t he whole dat a is inv olved in the ABC approx- 

our conclusion on ABC- 



imation as in 



Sousa et al. 



( |2009| ), 

based model choice is to exploit the approximations in an 
exploratory manner as measures of discrepancies rather than 
genuine posterior pro babilities. This directi on relates with 

Furthermore, 



2009 



the analyses found in Ratmann et al. 
a version of this explorato ry analysis is a lready provided in 
the DIY-ABC software of ( [Cornuet et al.[ [20F8). An option 



in this software allows for the computation of a Monte Carlo 
evaluation of false allocation rates resulting from using the 
ABC posterior probabilities in selecting a model as the most 
likely. For instance, in the setting of both our population ge- 
netic experiments, DIY-ABC gives false allocation rates equal 
to 20% (under scenarios 1 and 2) and 14.5% and 12.5% (un- 
der scenarios 1 and 2), respectively. This evaluation obviously 
shifts away from the performances of ABC as an approxima- 
tion to the posterior probability towards the performances of 
the whole Bayesian apparatus for selecting a model, but this 
nonetheless represents a useful and manageable quality assess- 
ment for practitioners. 
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Appendix: SI Results 

A normal illustration. The following reproduces the Poisson 
geometric illustration in a normal model. If we look at a fully 
normal A/'(/i, (7^) setting, we have 

/(y|/i) oc exp |-7i(T"^(y - fif /2 - Y^V^ ~ vf 
hence 

f{y\y) oc exp|-cr"^^(j/i - cr""Ij^y.=„s . 

If we reparameterise the observations into u = (j/i — 
y,... ,y„-i - y,y), we do get 

-2/- n2 



X exp 



exp{-ncr (j/-^) /2} 

( n-l 



1=1 



/2 



since the Jacobian is 1. Hence 

iii-i 
-o-"^ ^ lii /2 - , 



_ i 


= 1 


"n-l 






'-1 


.! = 1 





mean -70500 




65 70 75 80 85 -200000 -100000 

Fig. 8. Empirical distributions of the log discrepancy log (y)/gi2 (y) for 
datasets of size n = 15 simulated from A/'(/i, crj) (left) and J\f{fi, 0-2) (right) 
distributions when ai = 0.1 and a2 = 10, based on 10"^ replications and a flat 
prior 
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Considering both models 

J/i, • ■ • ,2/n ~ A/'(^,o-i) and yi, . . . ,y„'^ JV{fi,al) , 
the discrepancy ratio is then given by 



IT exp . 



and is connected with the lack of consistency of the Bayes 
factor: 

Lemma 2. Consider model selection between model 1: Af{fJ,, Oj) 
and model 2: J\f{fJ,, o"!), (Ji and a2 being given, with prior dis- 
tributions •7ri(/i) = 7r2(/i) equal to a Af{0,a'^) distribution and 
when the observed data y consists of iid observations with fi- 
nite mean and variance. Then S{y) = X^iLi minimal 
sufficient statistic for both models and the Bayes factor based 
on the sufficient statistic S{y), B^2{y)i satisfies 

lim -8^2(7) = 1 a.s. 

n— >oo 

Fig [8] illustrates the behaviour of the discrepancy ratio 
when (Ti =0.1 and a2 ~ 10, for datasets of size n = 15 simu- 
lated according to both models. The discrepancy (expressed 
on a log scale) is once again dramatic, in concordance with 
the above lemma. 

If we now turn to an alternative choice of sufBcient statis- 
tic, using the pair {y,S'^) with 



we follow the solution of Didelot et al. (2011 1. Using a conju- 
gate prior fi ~ Af{0, a^), the true Bayes factor is equal to the 
Bayes factor based on the corresponding distributions of the 
pair (j7, S^) in the respective models. Therefore, with sufh- 
cient computing power, the ABC approximation to the Bayes 
factor can be brought arbitrarily close to the true Bayes fac- 
tor. However, this coincidence does not bring any intuition on 
the behaviour of the ABC approximations in realistic settings. 



Larger experiment. We also considered a more informative 
population genetic experiment with the same scenarios (1 and 
2) as in the second experiment. One hundred datasets were 
simulated under scenario 1 with 3 populations, i.e. 6 param- 
eters. We take 100 diploid individuals per population, 50 loci 
per individual. This thus corresponds to 300 genotypes per 
dataset. The IS algorithm was performed using 100 coalescent 
trees per particle. The marginal likelihood of both scenarios 
has been computed for the same set for both 1000 particles 
(ISl) and 50,000 particles (IS2). A national cluster of 376 
processors (including 336 Quad Core processors) was used for 
this massive experiment (which required more than 12 calen- 
dar days for the importance sampling part). 

The confidence about the IS approximation can be as- 
sessed on Fig[7j which shows that both runs most always pro- 
vide the same numerical value, which almost uniformly is very 
close to one. This makes the fit of the ABC approximation 
to the true value harder to assess, even though we can spot 
a trend towards under-estimation. Furthermore, they almost 
all lead to correctly select model 1. 
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